masseyke
Member

@masseyke masseyke commented Oct 7, 2025

This adds logic to SamplingService's clusterChanged method so that samples are removed from memory whenever the sampling configuration for those samples is deleted or modified.
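Here is a minimal sketch of that eviction rule in plain Java. It is a hypothetical model, not the actual SamplingService code. Configurations are modeled as strings keyed by index name, and the in-memory samples as lists of document IDs keyed by index name. Samples for an index are dropped when its configuration is missing from the new metadata (deleted) or no longer equal to the old one (modified).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a per-project sample cache; not the real SamplingService.
class SampleEvictionSketch {
    // In-memory samples, keyed by index name.
    final Map<String, List<String>> samplesByIndex = new HashMap<>();

    // Drops samples for every index whose sampling configuration was deleted or
    // modified, and returns the names of those indices.
    Set<String> evictStaleSamples(Map<String, String> oldConfigs, Map<String, String> newConfigs) {
        Set<String> stale = new HashSet<>();
        for (Map.Entry<String, String> entry : oldConfigs.entrySet()) {
            // newConfigs.get(...) is null for a deleted config, which never equals the old one.
            if (!Objects.equals(entry.getValue(), newConfigs.get(entry.getKey()))) {
                stale.add(entry.getKey());
            }
        }
        stale.forEach(samplesByIndex::remove);
        return stale;
    }
}
```

Indices that appear only in the new metadata have no samples yet, so in this model the loop only needs to walk the old configurations.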

@masseyke masseyke added >non-issue :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v9.3.0 labels Oct 7, 2025
@masseyke masseyke marked this pull request as ready for review October 10, 2025 15:01
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Oct 10, 2025
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds logic to handle cleanup of cached samples when sampling configurations are deleted or modified in the SamplingService's clusterChanged method.

  • Implements cleanup logic in SamplingService.clusterChanged() to remove samples when their configurations are deleted or changed
  • Adds comprehensive test coverage for various configuration change scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • SamplingService.java: Implements the core cleanup logic in clusterChanged method to remove samples when configurations are deleted or modified
  • SamplingServiceTests.java: Adds comprehensive test case covering project deletion, metadata removal, configuration changes, and unchanged configuration scenarios
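The four test scenarios can be expressed against a hypothetical project-level model: a map from project ID to that project's index-to-configuration map. A missing project stands for a deleted project, and an empty inner map stands for removed sampling metadata. These names and types are illustrative assumptions, not the Elasticsearch classes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of cluster state: project id -> (index name -> sampling config).
// A missing project means the project was deleted; an empty inner map means the
// project exists but has no sampling metadata.
class ProjectScenariosSketch {
    record SampleKey(String projectId, String index) {}

    // Returns the (project, index) pairs whose samples should be dropped.
    static Set<SampleKey> staleKeys(Map<String, Map<String, String>> before,
                                    Map<String, Map<String, String>> after) {
        // Consider projects from both states, as the PR does.
        Set<String> projectIds = new HashSet<>(before.keySet());
        projectIds.addAll(after.keySet());
        Set<SampleKey> stale = new HashSet<>();
        for (String projectId : projectIds) {
            Map<String, String> oldConfigs = before.getOrDefault(projectId, Map.of());
            Map<String, String> newConfigs = after.getOrDefault(projectId, Map.of());
            for (Map.Entry<String, String> entry : oldConfigs.entrySet()) {
                // Deleted (null) or modified (not equal) configs make the samples stale.
                if (!Objects.equals(entry.getValue(), newConfigs.get(entry.getKey()))) {
                    stale.add(new SampleKey(projectId, entry.getKey()));
                }
            }
        }
        return stale;
    }
}
```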

).collect(Collectors.toSet());
for (ProjectId projectId : allProjectIds) {
if (event.customMetadataChanged(projectId, SamplingMetadata.TYPE)) {
SamplingMetadata oldSamplingConfig = event.previousState().metadata().hasProject(projectId)
Contributor


nit: can we rename these to oldSamplingMetadata and newSamplingMetadata? Otherwise it's easy to confuse them with the actual sampling configs.

Member Author


Oh good catch.

Contributor

@nielsbauman nielsbauman left a comment


Some drive-by comments, I hope you don't mind :)

Comment on lines 283 to 286
Set<ProjectId> allProjectIds = Stream.concat(
    event.state().metadata().projects().values().stream().map(ProjectMetadata::id),
    event.previousState().metadata().projects().values().stream().map(ProjectMetadata::id)
).collect(Collectors.toSet());
Contributor


I think we can just do this, right?

Suggested change
- Set<ProjectId> allProjectIds = Stream.concat(
-     event.state().metadata().projects().values().stream().map(ProjectMetadata::id),
-     event.previousState().metadata().projects().values().stream().map(ProjectMetadata::id)
- ).collect(Collectors.toSet());
+ Set<ProjectId> allProjectIds = Sets.union(
+     event.previousState().metadata().projects().keySet(),
+     event.state().metadata().projects().keySet()
+ );
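Both forms produce the same set of project IDs. This JDK-only sketch models project IDs as strings. To stay self-contained, it uses a copy plus addAll in place of Elasticsearch's Sets.union helper.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical comparison of the two ways to collect project IDs from both states.
class ProjectIdUnionSketch {
    // Mirrors the original approach: stream both ID collections into one set.
    static Set<String> viaStreamConcat(Set<String> previousIds, Set<String> currentIds) {
        return Stream.concat(previousIds.stream(), currentIds.stream()).collect(Collectors.toSet());
    }

    // Mirrors the suggested approach: union the two key sets directly.
    static Set<String> viaUnion(Set<String> previousIds, Set<String> currentIds) {
        Set<String> union = new HashSet<>(previousIds);
        union.addAll(currentIds);
        return union;
    }
}
```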

I'd personally be inclined to optimize that a bit more, because 99% of the time those two key sets will be equal. That only matters when we have lots of projects, though, so I'm fine with labeling it a premature optimization.
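One way that optimization could look, as a hypothetical sketch with string project IDs: when the two key sets are equal, reuse one of them as the union instead of allocating a new set. This is not code from the PR.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical fast path for the common case where no project was added or removed.
class ProjectIdFastPathSketch {
    // Returns the union of both key sets, skipping the copy when they are equal.
    static Set<String> allProjectIds(Set<String> previousIds, Set<String> currentIds) {
        if (previousIds.equals(currentIds)) {
            return currentIds; // same members, so either set already is the union
        }
        Set<String> union = new HashSet<>(previousIds);
        union.addAll(currentIds);
        return union;
    }
}
```

Set.equals is still linear in the number of projects, so this saves the allocation and copy rather than the scan.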

Member Author

@masseyke masseyke Oct 10, 2025


Ha yeah you're right. I'll change that. I forgot that there was a metadata().projects() (even though I'm using it)!

Comment on lines 289 to 303
SamplingMetadata oldSamplingConfig = event.previousState().metadata().hasProject(projectId)
    ? event.previousState().projectState(projectId).metadata().custom(SamplingMetadata.TYPE)
    : null;
SamplingMetadata newSamplingConfig = event.state().metadata().hasProject(projectId)
    ? event.state().projectState(projectId).metadata().custom(SamplingMetadata.TYPE)
    : null;
Map<String, SamplingConfiguration> newSampleConfigsMap = newSamplingConfig == null
    ? Map.of()
    : newSamplingConfig.getIndexToSamplingConfigMap();
Set<String> currentlyConfiguredIndexNames = newSampleConfigsMap.keySet();
Set<String> previouslyConfiguredIndexNames = oldSamplingConfig == null
    ? Set.of()
    : oldSamplingConfig.getIndexToSamplingConfigMap().keySet();
Set<String> removedIndexNames = new HashSet<>(previouslyConfiguredIndexNames);
removedIndexNames.removeAll(currentlyConfiguredIndexNames);
Contributor


This is more of an optional code styling suggestion, so feel free to ignore if you prefer your current implementation.

Suggested change
- SamplingMetadata oldSamplingConfig = event.previousState().metadata().hasProject(projectId)
-     ? event.previousState().projectState(projectId).metadata().custom(SamplingMetadata.TYPE)
-     : null;
- SamplingMetadata newSamplingConfig = event.state().metadata().hasProject(projectId)
-     ? event.state().projectState(projectId).metadata().custom(SamplingMetadata.TYPE)
-     : null;
- Map<String, SamplingConfiguration> newSampleConfigsMap = newSamplingConfig == null
-     ? Map.of()
-     : newSamplingConfig.getIndexToSamplingConfigMap();
- Set<String> currentlyConfiguredIndexNames = newSampleConfigsMap.keySet();
- Set<String> previouslyConfiguredIndexNames = oldSamplingConfig == null
-     ? Set.of()
-     : oldSamplingConfig.getIndexToSamplingConfigMap().keySet();
- Set<String> removedIndexNames = new HashSet<>(previouslyConfiguredIndexNames);
- removedIndexNames.removeAll(currentlyConfiguredIndexNames);
+ Map<String, SamplingConfiguration> oldSampleConfigsMap = Optional.ofNullable(event.previousState().metadata().getProject(projectId))
+     .map(p -> p.custom(SamplingMetadata.TYPE))
+     .map(SamplingMetadata::getIndexToSamplingConfigMap)
+     .orElse(Map.of());
+ Map<String, SamplingConfiguration> newSampleConfigsMap = Optional.ofNullable(event.state().metadata().getProject(projectId))
+     .map(p -> p.custom(SamplingMetadata.TYPE))
+     .map(SamplingMetadata::getIndexToSamplingConfigMap)
+     .orElse(Map.of());
+ Set<String> removedIndexNames = new HashSet<>(oldSampleConfigsMap.keySet());
+ removedIndexNames.removeAll(newSampleConfigsMap.keySet());
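The suggested chain works because Optional.map returns an empty Optional when the mapping function yields null. A missing project and a project without sampling metadata therefore both fall through to orElse(Map.of()). Below is a self-contained sketch with hypothetical ProjectMeta and SamplingMeta records, using a Map whose get returns null for an unknown project.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-ins: ProjectMeta.sampling() is null when a project has no
// sampling metadata, and a missing map entry means the project does not exist.
class OptionalChainSketch {
    record SamplingMeta(Map<String, String> indexToConfig) {}
    record ProjectMeta(SamplingMeta sampling) {}

    // Collapses "no such project" and "no sampling metadata" into an empty map.
    static Map<String, String> configsFor(Map<String, ProjectMeta> projects, String projectId) {
        return Optional.ofNullable(projects.get(projectId))
            .map(ProjectMeta::sampling) // empty Optional if sampling() is null
            .map(SamplingMeta::indexToConfig)
            .orElse(Map.of());
    }

    // Index names that were configured before but are no longer configured.
    static Set<String> removedIndexNames(Map<String, ProjectMeta> before, Map<String, ProjectMeta> after, String projectId) {
        Set<String> removed = new HashSet<>(configsFor(before, projectId).keySet());
        removed.removeAll(configsFor(after, projectId).keySet());
        return removed;
    }
}
```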

Member Author


I like that too. Unfortunately, though, getProject() throws an exception rather than returning null if the project doesn't exist.

Member Author


Oh but using projects().get(projectId) works fine. I'll switch to that.
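The choice of lookup matters for the Optional chain. Wrapping a throwing accessor in Optional.ofNullable cannot help, because the exception escapes before ofNullable runs. A map-style get, by contrast, returns null for a missing key. The sketch below models both with a plain Map. getOrThrow is a hypothetical accessor, and its NoSuchElementException is an assumption, not necessarily the exception getProject() throws.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical comparison of a throwing lookup and a null-returning lookup.
class LookupStyleSketch {
    // Throwing accessor, modeled on the behavior described for getProject().
    static String getOrThrow(Map<String, String> projects, String id) {
        String project = projects.get(id);
        if (project == null) {
            throw new NoSuchElementException("no project [" + id + "]");
        }
        return project;
    }

    // Null-safe variant in the spirit of projects().get(projectId): absent -> empty Optional.
    static Optional<String> find(Map<String, String> projects, String id) {
        return Optional.ofNullable(projects.get(id));
    }

    // Returns true when wrapping the throwing accessor still throws for a missing id.
    static boolean wrappingThrowingLookupFails(Map<String, String> projects, String id) {
        try {
            Optional.ofNullable(getOrThrow(projects, id));
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }
}
```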

event.previousState().metadata().projects().values().stream().map(ProjectMetadata::id)
).collect(Collectors.toSet());
for (ProjectId projectId : allProjectIds) {
if (event.customMetadataChanged(projectId, SamplingMetadata.TYPE)) {
Contributor


If we're retrieving the SamplingMetadata from both projects below, it doesn't really make sense to do this customMetadataChanged check here, as that retrieves the SamplingMetadata from both projects as well. We might as well just get the two customs below and check their equality here. What do you think?
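Here is the proposed alternative, sketched with hypothetical stand-ins. The records below are not Elasticsearch types, and the sketch does not model how the real customMetadataChanged compares values. It looks up both values once, skips the project when they are equal, and otherwise hands both values to the cleanup step.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-ins: each map is project id -> that project's sampling metadata.
class EqualityShortCircuitSketch {
    // A record gives SamplingMeta value-based equals(), which the comparison relies on.
    record SamplingMeta(Map<String, String> indexToConfig) {}

    // Old and new metadata for a project whose sampling metadata changed.
    record Change(SamplingMeta oldMeta, SamplingMeta newMeta) {}

    static Optional<Change> detectChange(Map<String, SamplingMeta> before, Map<String, SamplingMeta> after, String projectId) {
        SamplingMeta oldMeta = before.get(projectId);
        SamplingMeta newMeta = after.get(projectId);
        if (Objects.equals(oldMeta, newMeta)) {
            return Optional.empty(); // unchanged (including absent in both): skip
        }
        // Hand both values to the cleanup step so it does not look them up again.
        return Optional.of(new Change(oldMeta, newMeta));
    }
}
```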

Member Author


That won't really save us anything other than two hash map lookups, will it? And it's possible that someone could optimize customMetadataChanged in the future, and then we'd miss out on that.

Contributor


We'd save the project lookup and the customs lookup, both twice. I agree that that's not much. I'll leave it up to you 👍

@masseyke masseyke added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 10, 2025
@elasticsearchmachine elasticsearchmachine merged commit 5de713f into elastic:main Oct 10, 2025
34 checks passed
@masseyke masseyke deleted the random-sampling-cluster-changed branch October 10, 2025 22:16
Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >non-issue Team:Data Management Meta label for data/management team v9.3.0

4 participants